Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification
Large Language Models (LLMs) have demonstrated remarkable proficiency in
generating fluent text. However, they often generate inaccurate or
hallucinated content. This issue arises in both non-retrieval-based and
retrieval-augmented generation approaches, and existing post-hoc rectification
methods may not address the hallucination errors that accumulate through the
"snowballing" issue, especially in reasoning tasks. To tackle these challenges,
we introduce a novel approach
called Real-time Verification and Rectification (Ever). Instead of waiting
until the end of the generation process to rectify hallucinations, Ever employs
a real-time, step-wise generation and hallucination rectification strategy. The
primary objective is to detect and rectify hallucinations as they occur during
the text generation process. When compared to both retrieval-based and
non-retrieval-based baselines, Ever demonstrates a significant improvement in
generating trustworthy and factually accurate text across a diverse range of
tasks, including short-form QA, biography generation, and multi-hop reasoning.
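The step-wise generate-verify-rectify loop described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `verify` and `rectify` are hypothetical placeholders (in Ever these would involve retrieval or model-confidence checks and an evidence-grounded rewrite), and the draft steps are scripted rather than sampled from an LLM.

```python
def verify(step):
    # Hypothetical fact-checker: in Ever this would query external
    # evidence or model confidence; here, any step containing
    # "WRONG" is treated as hallucinated.
    return "WRONG" not in step

def rectify(step):
    # Hypothetical rewrite of an unsupported step.
    return step.replace("WRONG", "corrected")

def ever_generate(prompt, draft_steps):
    """Append one step at a time, rectifying each hallucinated step
    immediately so the error cannot snowball into later steps."""
    text = prompt
    for step in draft_steps:
        if not verify(step):
            step = rectify(step)  # real-time fix, not post-hoc
        text += " " + step
    return text
```

The point of the structure is that step three is generated from an already-corrected step two, which is what distinguishes this from post-hoc rectification of the finished output.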
Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations
There is an inescapable long-tailed class-imbalance issue in many real-world
classification problems. Current methods for addressing this problem only
consider scenarios where all examples come from the same distribution. However,
in many cases, there are multiple domains with distinct class imbalance. We
study this multi-domain long-tailed learning problem and aim to produce a model
that generalizes well across all classes and domains. Towards that goal, we
introduce TALLY, a method that addresses this multi-domain long-tailed learning
problem. Built upon a proposed selective balanced sampling strategy, TALLY
achieves this by mixing the semantic representation of one example with the
domain-associated nuisances of another, producing a new representation for use
as data augmentation. To improve the disentanglement of semantic
representations, TALLY further utilizes a domain-invariant class prototype that
averages out domain-specific effects. We evaluate TALLY on several benchmarks
and real-world datasets and find that it consistently outperforms other
state-of-the-art methods under both subpopulation shift and domain shift. Our
code and data have been released at https://github.com/huaxiuyao/TALLY.
Comment: Accepted by TML
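The core augmentation idea can be sketched as follows. This is a minimal illustration, not the released TALLY code: it assumes each example's representation has already been disentangled into a semantic part and a domain-nuisance part (in the paper this decomposition is learned, not given), and the prototype is a plain per-class average over domains.

```python
def domain_invariant_prototype(features_by_domain):
    # One class's feature vector from each domain; averaging over
    # domains washes out domain-specific effects.
    n = len(features_by_domain)
    return [sum(col) / n for col in zip(*features_by_domain)]

def tally_augment(semantic_a, nuisance_b):
    # New training representation: the semantics (class content) of
    # example A carried by the domain nuisances of example B.
    # Assumes an additive semantic + nuisance decomposition.
    return [s + d for s, d in zip(semantic_a, nuisance_b)]
```

Pairing tail-class semantics with nuisances drawn from other domains is what lets rare classes borrow the domain diversity of frequent ones.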
Graph Few-shot Learning via Knowledge Transfer
Semi-supervised node classification is a challenging problem that has been
studied extensively. As a frontier, Graph Neural Networks (GNNs) have recently
attracted great interest; they update the representation of each node by
aggregating information from its neighbors. However, most GNNs have shallow
layers with a limited receptive field and may not achieve satisfactory
performance especially when the number of labeled nodes is quite small. To
address this challenge, we propose a graph few-shot learning (GFL)
algorithm that incorporates prior knowledge learned from auxiliary graphs to
improve classification accuracy on the target graph. Specifically, a
transferable metric space characterized by a node embedding and a
graph-specific prototype embedding function is shared between auxiliary graphs
and the target, facilitating the transfer of structural knowledge. Extensive
experiments and ablation studies on four real-world graph datasets demonstrate
the effectiveness of our proposed model.
Comment: Full paper (with Appendix) of AAAI 202
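The shared metric space with graph-specific prototypes can be sketched as prototype-based few-shot classification. This is an illustrative sketch, not the paper's implementation: node embeddings are given as plain vectors rather than produced by a GNN, and classification is by nearest class prototype in Euclidean distance.

```python
def prototype(embeddings):
    # Class prototype: mean of the few labeled support embeddings.
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, prototypes):
    # Label a query node by its nearest class prototype in the
    # (transferable) metric space; prototypes: {class: vector}.
    return min(prototypes, key=lambda c: sq_dist(query, prototypes[c]))
```

Because the embedding and prototype functions are shared across auxiliary graphs and the target graph, structural knowledge learned on the former transfers to the latter even when labeled target nodes are scarce.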
Fine-tuning Language Models for Factuality
The fluency and creativity of large pre-trained language models (LLMs) have
led to their widespread use, sometimes even as a replacement for traditional
search engines. Yet language models are prone to making convincing but
factually inaccurate claims, often referred to as 'hallucinations.' These
errors can inadvertently spread misinformation or harmfully perpetuate
misconceptions. Further, manual fact-checking of model responses is a
time-consuming process, making human factuality labels expensive to acquire. In
this work, we fine-tune language models to be more factual, without human
labeling and targeting more open-ended generation settings than past work. We
leverage two key recent innovations in NLP to do so. First, several recent
works have proposed methods for judging the factuality of open-ended text by
measuring consistency with an external knowledge base or simply a large model's
confidence scores. Second, the direct preference optimization algorithm enables
straightforward fine-tuning of language models on objectives other than
supervised imitation, using a preference ranking over possible model responses.
We show that learning from automatically generated factuality preference
rankings, generated either through existing retrieval systems or our novel
retrieval-free approach, significantly improves the factuality (percent of
generated claims that are correct) of Llama-2 on held-out topics compared with
RLHF or decoding strategies targeted at factuality. At 7B scale, compared to
Llama-2-chat, we observe 58% and 40% reductions in factual error rate when
generating biographies and answering medical questions, respectively.
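For a single factuality preference pair, the direct preference optimization objective mentioned above can be written roughly as follows. This is a sketch of the standard DPO loss, not the paper's training code: in practice the log-probabilities come from the fine-tuned policy and a frozen reference model, and the preferred response w is the one judged more factual by the automatic ranking.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one pair where response w (more factual) is
    preferred over response l. The margin is the difference of
    policy-vs-reference log-ratios, scaled by beta."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid
```

Minimizing this pushes the policy to raise the relative likelihood of the more-factual response, which is how a preference ranking substitutes for supervised imitation of human-written factual text.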